37 research outputs found

    BMC Evol. Biol.

    Background: Insertions and deletions of DNA segments (indels) are, together with substitutions, the major mutational processes that generate genetic variation. Here we focus on recent DNA insertions and deletions in protein coding regions of the human genome to investigate selective constraints on indels in protein evolution. Results: Frequencies of inserted and deleted amino acids differ from background amino acid frequencies in the human proteome. Small amino acids are overrepresented, while hydrophobic, aliphatic and aromatic amino acids are strongly suppressed. Indels are found to be preferentially located in protein regions that do not form important structural domains. Amino acid insertion and deletion rates in genes associated with elementary biochemical reactions (e.g. catalytic activity, ligase activity, electron transport, or catabolic process) are lower than those in other genes, indicating that these genes are subject to stronger purifying selection. Conclusion: Our analysis indicates that indels in human protein coding regions are subject to distinct levels of selective pressure with regard to their structural impact on the amino acid sequence, as well as to general properties of the genes they are located in. These findings confirm that many commonly accepted characteristics of selective constraints for substitutions are also valid for amino acid insertions and deletions.

    PhyloSim - Monte Carlo simulation of sequence evolution in the R statistical computing environment

    Background: The Monte Carlo simulation of sequence evolution is routinely used to assess the performance of phylogenetic inference methods and sequence alignment algorithms. Progress in the field of molecular evolution fuels the need for more realistic and hence more complex simulations, adapted to particular situations, yet current software makes unreasonable assumptions such as homogeneous substitution dynamics or a uniform distribution of indels across the simulated sequences. This calls for an extensible simulation framework written in a high-level functional language, offering new functionality and making it easy to incorporate further complexity. Results: PhyloSim is an extensible framework for the Monte Carlo simulation of sequence evolution, written in R, using the Gillespie algorithm to integrate the actions of many concurrent processes such as substitutions, insertions and deletions. Uniquely among sequence simulation tools, PhyloSim can simulate arbitrarily complex patterns of rate variation and multiple indel processes, and allows for the incorporation of selective constraints on indel events. User-defined complex patterns of mutation and selection can be easily integrated into simulations, allowing PhyloSim to be adapted to specific needs. Conclusions: Close integration with R and the wide range of features implemented offer unmatched flexibility, making it possible to simulate sequence evolution under a wide range of realistic settings. We believe that PhyloSim will be useful to future studies involving simulated alignments.

    Accurate reconstruction of insertion-deletion histories by statistical phylogenetics

    The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically-sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes. Comment: 28 pages, 15 figures. arXiv admin note: text overlap with arXiv:1103.434

    The evolution of the human genome: insertions and deletions in protein coding regions


    Evolutionary dynamics of eukaryotic transposable elements

    Transposable elements (TEs) are DNA sequences that have the ability to replicate within a genome using a variety of mechanisms. They are present in almost all eukaryotic genomes, and they play an important role in genome evolution by creating genetic variation through their mobility. TEs can be divided into two classes (I and II) based on their replication mechanism. Class I elements use an RNA intermediate for transposition and are called retrotransposons. They can be further subdivided into LTR and non-LTR elements, named after the presence or absence of long terminal repeat (LTR) sequences in the element. Class II elements use a DNA intermediate and are therefore called DNA transposons. The evolutionary factors that influence transposable element abundance have received considerable attention but are still not fully understood. In my thesis I have focused on three different projects concerning the evolutionary dynamics of TEs. In my first project I analyzed the evolutionary dynamics of the LTR family roo and its highly diverged relative rooA in 12 closely related Drosophila species. Roo is the most abundant retrotransposon in the fruit fly Drosophila melanogaster. Its evolutionary origins and dynamics are thus of special interest for understanding the evolutionary history of Drosophila genome organization. Within the 12 genomes I found a broad spectrum for the evolutionary dynamics of roo and rooA, ranging from recent intense transpositional activity to slow decay and extinction. Furthermore, I suggest an origin of roo/rooA within the Drosophila clade based on the balance of phylogenetic evidence, sequence divergence distribution, and the occurrence of solo-LTR elements. The second project in my thesis concerns the BEL/Pao subclass of LTR retrotransposons. LTR elements are the most abundant class of TEs. In contrast to the other subclasses of LTR elements, little attention has been paid to elements belonging to the metazoan BEL/Pao subclass. I therefore searched for all BEL/Pao elements in a set of 62 metazoan genomes and analyzed their evolutionary history. My work shows that BEL/Pao elements are the second most abundant class of LTR retrotransposons in metazoan species and are therefore much more frequent than I previously thought. Furthermore, I identified two novel BEL/Pao superfamilies. The presence of BEL/Pao elements in both metazoan kingdoms suggests that they arose during early metazoan evolution. In my third project I studied the influence of the mating system on TE dynamics in the predominantly selfing plant A. thaliana and its close outcrossing relative A. lyrata. There are two opposing hypotheses that make predictions regarding the impact of selfing and outcrossing on TE abundance. The first predicts a higher TE number in a selfing species due to less ectopic recombination that eliminates TE copies. The second predicts a lower TE number in selfing species, because TEs can spread more easily in outcrossing species through recombination. I conducted the first genome-scale analysis regarding the influence of the mating system on TE abundance. I identified more than three times as many TE copies in the outcrossing A. lyrata as in the selfing A. thaliana, as well as ten times more TE families unique to A. lyrata. On average, elements in A. lyrata are younger than elements in A. thaliana. In particular, A. thaliana shows a marked decrease in element number starting approximately 0.5 million years ago, around the time predominant selfing originated. My observations suggest that if the mating system is an important factor in determining TE copy numbers, then selfing species are likely to have fewer transposable elements.

    5 Jahre klinische Erfahrungen mit ösophagopharyngealer Druckmessung (5 years of clinical experience with esophagopharyngeal pressure measurement)


    BEL/Pao retrotransposons in metazoan genomes

    Background: Long terminal repeat (LTR) retrotransposons are a widespread kind of transposable element present in eukaryotic genomes. They are a major factor in genome evolution due to their ability to create large scale mutations and genome rearrangements. Compared to other transposable elements, little attention has been paid to elements belonging to the metazoan BEL/Pao subclass of LTR retrotransposons. No comprehensive characterization of these elements is available so far. The aim of this study was to describe all BEL/Pao elements in a set of 62 sequenced metazoan genomes, and to analyze their phylogenetic relationship. Results: We identified a total of 7,861 BEL/Pao elements in 53 of our 62 study genomes. We identified BEL/Pao elements in 20 genomes where such elements had not been found so far. Our analysis shows that BEL/Pao elements are the second-most abundant class of LTR retrotransposons in the genomes we study, more abundant than Ty1/Copia elements, and second only to Ty3/Gypsy elements. They occur in multiple phyla, including basal metazoan phyla, suggesting that BEL/Pao elements arose early in animal evolution. We confirm findings from previous studies that BEL/Pao elements do not occur in mammals. The elements we found can be grouped into more than 1,725 families, 1,623 of which are new, previously unknown families. These families fall into seven superfamilies, only five of which have been characterized so far. One new superfamily is a major subdivision of the Pao superfamily which we propose to call Dan, because it is restricted to the genome of the zebrafish Danio rerio. The other new superfamily comprises 83 elements and is restricted to lower aquatic eumetazoans. We propose to call this superfamily Flow. BEL/Pao elements do not show any signs of recent horizontal gene transfer between distantly related species. Conclusions: In sum, our analysis identifies thousands of new BEL/Pao elements and provides new insights into their distribution, abundance, and evolution.